-
Notifications
You must be signed in to change notification settings - Fork 724
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
api: support to query whether pd has loaded region #8749
base: master
Are you sure you want to change the base?
Conversation
Signed-off-by: lhy1024 <[email protected]>
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #8749 +/- ##
==========================================
- Coverage 70.10% 69.63% -0.48%
==========================================
Files 517 516 -1
Lines 79983 79706 -277
==========================================
- Hits 56076 55500 -576
- Misses 20339 20694 +355
+ Partials 3568 3512 -56
Flags with carried forward coverage won't be shown. Click here to find out more. |
pkg/storage/storage.go
Outdated
regionLoadedFromDefault bool | ||
regionLoadedFromStorage bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need to distinguish these two kinds?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I want to avoid problems after ps.useRegionStorage switch
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about using string?
regionLoadedFrom
: "", "etcd", "leveldb"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so, we just have these two ways now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer just using a single field to store this info like:
type regionSource int
const (
unloaded regionSource = iota
etcd
leveldb
)
Signed-off-by: lhy1024 <[email protected]>
server/api/status.go
Outdated
@@ -39,11 +40,13 @@ func newStatusHandler(svr *server.Server, rd *render.Render) *statusHandler { | |||
// @Success 200 {object} versioninfo.Status | |||
// @Router /status [get] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure whether it is suitable. ref #8748 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about /member or /health?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/members return all member info according etcd.ListEtcdMembers
/health is to query the status of each node through the /ping interface to return information about all the nodes, if we don't add a new api, we won't be able to get the result of whether the load region is complete or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
status is more like some fixed information to me, at least for the current definition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
put it under /status, which is appropriate from the name, but currently status returns compiled information.
Yes, as I said in the issue.
pkg/versioninfo/versioninfo.go
Outdated
Version string `json:"version"` | ||
GitHash string `json:"git_hash"` | ||
StartTimestamp int64 `json:"start_timestamp"` | ||
AreRegionsLoaded bool `json:"are_regions_loaded"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
prefer to use loaded
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Loaded bool
json:"loaded"
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes
Signed-off-by: lhy1024 <[email protected]>
Signed-off-by: lhy1024 <[email protected]>
pkg/versioninfo/versioninfo.go
Outdated
@@ -31,6 +31,7 @@ type Status struct { | |||
Version string `json:"version"` | |||
GitHash string `json:"git_hash"` | |||
StartTimestamp int64 `json:"start_timestamp"` | |||
Loaded bool `json:"loaded"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Prefer RegionLoaded
pkg/storage/storage.go
Outdated
regionLoadedFromDefault bool | ||
regionLoadedFromStorage bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about using string?
regionLoadedFrom
: "", "etcd", "leveldb"
pkg/storage/storage.go
Outdated
if atomic.LoadInt32(&ps.useRegionStorage) == 0 { | ||
return ps.Storage.LoadRegions(ctx, f) | ||
err := ps.Storage.LoadRegions(ctx, f) | ||
if err == nil { | ||
ps.regionLoadedFromDefault = true | ||
} | ||
return err | ||
} | ||
|
||
ps.mu.Lock() | ||
defer ps.mu.Unlock() | ||
if !ps.regionLoaded { | ||
if !ps.regionLoadedFromStorage { | ||
if err := ps.regionStorage.LoadRegions(ctx, f); err != nil { | ||
return err | ||
} | ||
ps.regionLoaded = true | ||
ps.regionLoadedFromStorage = true | ||
} | ||
return nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems we can use *coreStorage.LoadRegions()
directly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't check ps.regionLoadedFromStorage
Signed-off-by: lhy1024 <[email protected]>
Signed-off-by: lhy1024 <[email protected]>
pkg/storage/storage.go
Outdated
useRegionStorage int32 | ||
regionLoaded bool | ||
mu syncutil.Mutex | ||
useRegionStorage int32 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is that safe to switch between using or not using RegionStore in the master branch or the latest releases now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
defaultUseRegionStorage is true and only check switch when it was just elected leader.
Signed-off-by: lhy1024 <[email protected]>
pkg/storage/storage.go
Outdated
useRegionStorage int32 | ||
regionLoaded bool | ||
mu syncutil.Mutex | ||
useRegionStorageFlag int32 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use bool?
Signed-off-by: lhy1024 <[email protected]>
/test pull-integration-realcluster-test |
@@ -44,6 +45,7 @@ func (h *statusHandler) GetPDStatus(w http.ResponseWriter, _ *http.Request) { | |||
GitHash: versioninfo.PDGitHash, | |||
Version: versioninfo.PDReleaseVersion, | |||
StartTimestamp: h.svr.StartTimestamp(), | |||
RegionLoaded: storage.AreRegionsLoaded(h.svr.GetStorage()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we have a more general field here like "Ready" which we can extend in the future if we find more conditions under which PD can't serve?
Does PDStatus API exposed by all components of disaggregated PD?
Any chance we introduce a new API /ready which we can use across all TiDB components in the future?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Tema, thanks for your advice
Q1: A general field will be helpful to expand different scenarios, but we found that currently there is only RegionLoaded is needed. If there are other situations in the future, we tend to add new fields with more accurate meanings.
Q2: Standalone PD or Scheduling Service will expose this API(especially the RegionLoaded field), others such as TSO Service don't.
Q3: IMO, it's good we can use the same interface to query service status, but the fields may be different. The benefits of unification may not be obvious, but it is indeed a better way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally you want to decouple data plane (pd) from control plane (tidb-operator, tiUp), so that if you make changes to one, you don't have to upgrade other and it can start leveraging any improvements right away. That is why usually the common convention is to use /ready
across all components. This way the control-plane can have a common logic for rolling restart of all components, without need to necessarily overcustomize each of them. This is not just about PD microservices but many other components of TiDB cluster (tidb, dm, cdc, tikv). Some of them still require additional customization, but it would be nice to keep it to the minimum. Did you talk to tidb-operator team? Don't they want to have a /ready
across all components. I've herd there is a work on tidb-operator v2 or something like that. This might be a good opportunity to standartize protocol between control plane and data place as once it is out there, it is hard to change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PTAL @csuzhangxc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with Tema, for TiDB Operator, it's better to use a general API (like /ready
) to check the status in most cases. If this API can't meet some other special cases in the future, then we can consider to add/use some other APIs.
For this RegionLoaded case, in fact, TiDB Operator just wants to know when it can restart the next instance safely.
What problem does this PR solve?
Issue Number: Close #8748
What is changed and how does it work?
Check List
Tests
Release note